-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Quote shell command correctly for Posix shells #13214
Conversation
@@ -81,7 +81,7 @@ function shell_parse(raw::AbstractString, interp::Bool) | |||
if done(s,k) | |||
error("unterminated double quote") | |||
end | |||
if s[k] == '"' || s[k] == '$' | |||
if s[k] == '"' || s[k] == '$' || s[k] == '\\' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change to shell_parse
isn't mentioned in the commit message. Intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. Yes, this is intentional (see the "parse" in the title). Currently, quoted backslashes in double quotes are not correctly interpreted; they result in two backslashes: The string "\\"
is parsed to yield \\
instead of \
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example? That should really be a separate bug fix with a standalone test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Especially since that fix will need to be incorporated into @nolta's port of shell_parse
to Scheme.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The example is just this string, "\\"
. I added test cases to spawn.jl
as part of the patch. @nolta's port already has this change, if I read the Lisp correctly.
If you want, I can split this patch into two...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get this:
julia> Base.shell_parse("\\")
ERROR: dangling backslash
in shell_parse at ./shell.jl:63
in shell_parse at ./shell.jl:117
What do you expect to get?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was talking about the string "\\"
, which is represented as "\"\\\\\""
when input as Julia source. This string should lead to a single backslash. See e.g. in a shell:
echo "\\"
\
The first backslash escapes the second (this still happens inside double quotes), and shell_parse
strips quotes around arguments.
In Julia:
str="\"\\\\\""
println(str)
Base.shell_split(str)
["\\"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the same thing – both produce a string containing one backslash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I type this into a current Julia, I receive an error instead of the last line. (You asked for the "expected" output.) The test cases I added to spawn.jl
also report a disagreement where parsing yields too many backslashes -- sorry, I don't have the source string handy, it was more complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a strict bug fix and unrelated to the rest of this change, so I've pulled it out and applied it here: cd59221.
So the main things that give me pause here are the increased complexity and the dependence on regular expressions for something so low level. In general, we try to avoid using regexes if possible since it would be nice someday for some minimal subset of Julia not to depend on PCRE. But maybe that is a quixotic endeavor. @JeffBezanson, what do you think about the increased regex usage here? |
The complexity comes from trying to create string representations that are both correct and "nice". If we split this across two functions, then we can (a) keep the original function for "niceness" and (b) add a new five-line function that doesn't use regexps for correctness. Also, the regexps are straightforward and could easily be replaced with a direct implementation. However, this would be significantly longer (maybe three or five times). |
Am I correct that this is really just expanding the set of characters that are considered "special" relative to the original implementation? |
Yes, in principle this just switches from treating just a few characters as special, to treating them all as special and using a white list for non-special characters. Of course, the detail about how to decide which way to quote is different. I took an existing, well-tested Python implementation and converted it to Julia. Apart from translation errors the regexps should be correct. |
The quoting is now safe for Posix sh-like shells. The algorithm tries several ways to quote words (via single or double quotes or backslashes), and then uses a scoring method to choose the "best" result. The current scoring chooses the shortest string, and adds additional penalties for backslashes.
bf2f2d1
to
e92664d
Compare
This patch is ready to be applied, but we should probably wait until after #13215 has been applied. Currently, the REPL shell mode uses this function (incorrectly assuming it does full Posix shell quoting), so this patch would currently have user-visible effects. I assume people are currently relying on the fact that such quoting doesn't happen, as this is what e.g. currently allows file redirection and pipelines there. |
Superseded by #30636 (which added a "shell_escape_posixly" function that ensures quoting is safe for any posix shell, vs. the "shell_escape" function which pretty-prints Julia cmd syntax) |
The quoting is now safe for Posix sh-like shells.
The algorithm tries several ways to quote words (via single or double quotes or backslashes), and then uses a scoring method to choose the "best" result. The current scoring chooses the shortest string, and adds additional penalties for backslashes.